Toward Simplifying and Accurately Formulating Fragment Assembly

Author

  • Eugene W. Myers
Abstract

The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.
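The likelihood criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a candidate reconstruction is summarized by the start positions of its fragments, and that random sampling would spread those starts roughly uniformly along the reconstruction. The function name `ks_uniform` and the example layouts are hypothetical, and the statistic is the textbook two-sided Kolmogorov-Smirnov distance to the uniform distribution, which may differ in detail from the paper's formulation.

```python
def ks_uniform(starts, length):
    """Two-sided Kolmogorov-Smirnov distance between the empirical
    distribution of fragment start positions and the uniform
    distribution on [0, length]. Illustrative helper, not the paper's code."""
    xs = sorted(starts)
    n = len(xs)
    if n == 0 or length <= 0:
        raise ValueError("need at least one start position and a positive length")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = x / length  # uniform CDF evaluated at this start position
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so check the deviation on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical layouts of ten fragments on a reconstruction of length 100.
evenly_spread = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
piled_up = [0, 2, 4, 6, 8, 10, 12, 14, 16, 90]

print(ks_uniform(evenly_spread, 100))
print(ks_uniform(piled_up, 100))
```

Under these assumptions, a layout in which fragments pile up in one region, as can happen when copies of a repeat are collapsed onto each other, scores a larger deviation than an evenly spread layout, which is the intuition behind preferring a likely reconstruction over the shortest one.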

Similar resources

Scheduling of Multiple Autonomous Guided Vehicles for an Assembly Line Using Minimum Cost Network Flow

This paper proposed a parallel automated assembly line system to produce multiple products having multiple autonomous guided vehicles (AGVs). Several assembly lines are configured to produce multiple products in which the technologies of machines are shared among the assembly lines when required. The transportation between the stations in an assembly line (intra assembly line) and among station...

A chance-constrained multi-objective model for final assembly scheduling in ATO systems with uncertain sub-assembly availability

A chance-constraint multi-objective model under uncertainty in the availability of subassemblies is proposed for scheduling in ATO systems. The on-time delivery of customer orders as well as reducing the company's cost is crucial; therefore, a three-objective model is proposed including the minimization of 1) overtime, idle time, change-over, and setup costs, 2) total dispersion of items’ deliver...

Assembling DNA Fragments with a Distributed Genetic Algorithm

As more research centers embark on sequencing new genomes, the problem of DNA fragment assembly for shotgun sequencing is growing in importance and complexity. Accurate and fast assembly is a crucial part of any sequencing project and many algorithms have been developed to tackle it. Since the DNA fragment assembly problem is NP-hard, exact solutions are very difficult to obtain. Various heuris...

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

Matrix Algebra Problems with Applications in Psychology and Multivariate Analysis

Students in Multivariate Analysis need to develop skills in working with matrices and reading matrix expressions. Some of the skills which seem to be important are: 1. Performing simple matrix calculations 2. Manipulating algebraic expressions in matrices—substituting identities, simplifying expressions, etc. 3. Formulating substantive and statistical problems in matrix terms—or, recognizing su...

Journal title:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 2, Issue 2

Pages: -

Publication date: 1995